Natural Language Identification using Corpus-Based Models

Related articles

Spoken language identification using the speechdat corpus

Current language identification systems vary significantly in their complexity. The systems that use higher level linguistic information have the best performance. Nevertheless, that information is hard to collect for each new language. The system presented in this paper is easily extendable to new languages because it uses very little linguistic information. In fact, the presented system needs...

Unsupervised Natural Language Processing Using Graph Models

In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computer linguistic applications explicit rule based approaches prevail, while machine learning algorithms use implicit knowledge for generating linguistic knowledge. The question behind this work is: how far can we go in NLP without assuming explicit or implicit linguistic knowledge? Ho...

Web Text Corpus for Natural Language Processing

Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...

Corpus Design For Biomedical Natural Language Processing

This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have...

Journal

Journal title: HERMES - Journal of Language and Communication in Business

Year: 2017

ISSN: 1903-1785, 0904-1699

DOI: 10.7146/hjlcb.v7i13.25083